Abstract



Functional linear regression has recently attracted considerable interest. Many works focus on asymptotic
inference. In this paper we consider, in a non-asymptotic framework, a simple estimation procedure
based on functional Principal Regression. It consists in the minimization of a least square contrast coupled
with a classical projection on the space spanned by the m first empirical eigenvectors of the covariance
operator of the functional sample. The novelty of our approach is to select automatically the crucial
dimension m by minimization of a penalized least square contrast. Our method is based on model selection
tools. Yet, since such methods usually consist in projecting onto known non-random spaces, we
need to adapt them to an empirical eigenbasis made of data-dependent, hence random, vectors. The resulting
estimator is fully adaptive and is shown to satisfy an oracle inequality for the risk associated with the
prediction error and to attain optimal minimax rates of convergence over a certain class of ellipsoids. Our
model selection strategy is finally compared numerically with cross-validation.

AMS subject classification: Primary, 62J05; Secondary, 62G08. 

Keywords. Functional linear regression, functional principal components analysis, mean squared prediction 

Introduction

Functional data analysis has seen major advances over the past two decades, across many
fields of application. We refer to Ferraty and Vieu [20] and Ramsay and Silverman [32] for detailed examples
in medicine, linguistics and chemometrics and to Preda and Saporta [30] for applications in econometrics.

In this paper we suppose that the dependence between a real-valued response $Y$ and a functional predictor
$X$ belonging to a Hilbert space $(\mathbb{H}, \langle\cdot,\cdot\rangle, \|\cdot\|)$ is given by the functional linear model, namely

$$Y = \langle \beta, X \rangle + \varepsilon, \qquad (1)$$

where $\varepsilon$ stands for a noise term with variance $\sigma^2$ which is independent of $X$, and $\beta \in \mathbb{H}$ is an unknown function
to be estimated. In order to simplify the notations, the random variable $X$ is supposed to be centred as
well, which means that the function $t \mapsto E[X(t)]$ is identically equal to zero.

By multiplying both sides of Equation (1) by $X(s)$ and taking the expectation, we see easily that the
function $\beta$ is a solution of

$$g(s) := E[Y X(s)] = E[\langle \beta, X \rangle X(s)] =: \Gamma\beta(s), \qquad (2)$$

where $\Gamma$ is the covariance operator associated to the functional predictor $X$. Equation (2) is known to be
an ill-posed inverse problem (see Engl et al. [19, Chapter 2.1]).

The literature on the functional linear model is wide and numerous estimation procedures exist. A first
method consists in minimizing a least square criterion subject to a roughness penalty. For instance, Li and
Hsing [26] proposed an estimation procedure by minimization of such a criterion on periodic Sobolev spaces, and
Crambes et al. [16] generalized the well-known smoothing-spline estimator used in univariate nonparametric
regression. Another approach is based on dimension reduction: it consists in approximating the regression
function $\beta$ by projection onto finite-dimensional spaces. Those spaces are usually obtained by taking the
first components of a basis of $\mathbb{H}$. Some authors considered projection onto a fixed basis, such as the B-spline basis
(Ramsay and Dalzell [31]) or a general orthonormal basis (Cardot and Johannes [12]). But the most popular
method is Functional Principal Component Regression (FPCR), which consists in taking the random space
spanned by the eigenfunctions associated to the largest eigenvalues of the empirical covariance operator:

$$\Gamma_n := \frac{1}{n}\sum_{i=1}^n \langle X_i, \cdot \rangle X_i. \qquad (3)$$

The resulting estimator is shown to be consistent, but its behaviour is often erratic in simulation studies;
thus a smooth version using splines has been proposed by Cardot et al. [11]. The FPCR estimator is
shown to attain optimal rates of convergence for the risk associated to the prediction error over fixed curves
$x$ (see Cai and Hall [8]) as well as for the $L^2$-risk (see Hall and Horowitz [22]).

All the proposed estimators rely on the choice of at least one tuning parameter (the smoothing parameter 
appearing in the penalized criterion or the dimension of approximation space) which influences significantly 
the quality of estimation. Optimal choice of such parameters depends generally on both unknown regularities 
of the slope function /3 and the predictor X (see e.g. [U [TBI H2j ) and the parameters are usually chosen in 
practice by cross-validation. 

Until the recent work of Comte and Johannes [H], nonasymptotic results providing adaptive data- 
driven estimators were missing. Comte and Johannes [14], [15] propose model selection procedures for the
orthogonal series estimator first introduced by Cardot and Johannes [12]. In [14], they propose to select the
dimension by minimization of a penalized contrast criterion under a strong assumption of periodicity of the
curve $X$, while in [15] they define a dimension selection criterion by means of a stochastic penalized contrast
emulating Lepski's method (see Goldenshluger and Lepski [21]) and do not require specific assumptions on
the curve $X$. The resulting estimators are completely data-driven and achieve optimal minimax rates for
general weighted $L^2$-risks. However, since both dimension selection criteria depend on the weights defining the
risk, these selection procedures do not address the prediction error, which can be written as a weighted norm
whose weights are the unknown eigenvalues of the covariance operator.

In the same context as Comte and Johannes [J3], Brunel and Roche [7] propose to estimate the slope
function by minimizing a least square contrast on spaces spanned by the trigonometric basis. The dimension 
is selected by means of a penalized contrast. Their estimator is proved to attain the optimal minimax rate 
of convergence for the risk associated to the prediction error. 

Another approach, relying on reproducing kernel Hilbert spaces, is proposed by Cai and Yuan [9].
They develop a data-driven choice of the tuning parameter of the roughness regularization method (see e.g.
Ramsay and Silverman [32]). Their estimation procedure is shown to attain the optimal rate of convergence
without requiring knowledge of the covariance kernel. Lee and Park [25] also suggest general variable selection
procedures based on a weighted $L^1$ penalty under an assumption of sparsity on the functional parameter $\beta$.
Their estimator is shown to be consistent and to satisfy the oracle property.

In this paper, we propose an entirely data-driven procedure to select the adequate dimension for the
classical FPCR estimator. The proposed method is based on model selection tools developed in a general
context by Barron et al. [4], outlined by Massart [29], and in a regression context by Baraud [2], [3].
However, these tools are not meant to deal with estimators defined on random approximation spaces and
thus have to be adapted. Section 1 is devoted to the description of the estimation procedure. In Section 2, the resulting
estimator is proved to satisfy an oracle-type inequality and to attain the optimal minimax rate of convergence
for the risk associated to the prediction error for slope functions belonging to Sobolev classes.
In Section 3 a simulation study is presented, including a comparison with cross-validation. The proofs are
detailed in Section 4 and in the Appendix.
1 Definition of the estimator



We assume that we are given an i.i.d. sample $(Y_i, X_i)_{i \ge 1}$ where the generic $Y$ is real and $X$ belongs to the
Hilbert space $\mathbb{H}$. Thereafter, the Hilbert space is set to be $\mathbb{H} = L^2([0,1])$ equipped with its usual inner
product $\langle\cdot,\cdot\rangle$ defined by $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$, but our method adapts to more general Sobolev spaces
as well. We assumed above that $X$ is a centred random curve.

We recall that the theoretical covariance operator $\Gamma$ of $X$ defined by Equation (2) in the introductory
section is a self-adjoint trace-class operator defined on and with values in $L^2([0,1])$. This means that
the sequence of its eigenvalues, denoted $(\lambda_j)_{j \ge 1}$, is positive and summable. The associated sequence of
eigenfunctions is denoted by $(\psi_j)_{j \ge 1}$.



1.1 Collection of models 

If the $(\psi_j)_{j \ge 1}$ were known, an obvious choice would be to consider a model collection $(S_m)_m$ based on
these eigenfunctions. Unfortunately this is not possible. As the empirical covariance operator $\Gamma_n$ defined
by Equation (3) is self-adjoint too, there exists an orthonormal basis $(\hat\psi_j)_{j \ge 1}$ of $L^2([0,1])$ composed of
eigenfunctions of $\Gamma_n$; we denote by $(\hat\lambda_j)_{j \ge 1}$ the associated eigenvalues arranged in decreasing order. Since
$\Gamma_n$ has finite rank, the $(\hat\lambda_j)_{j \ge 1}$ are necessarily null at least for $j > n$. The couples $(\hat\lambda_j, \hat\psi_j)_{j \ge 1}$ are the
empirical counterparts of the $(\lambda_j, \psi_j)_{j \ge 1}$. Dimension reduction based on functional Principal Component
Analysis usually comes down to projecting the data on the space spanned by the $(\hat\psi_j)_{j \le K}$ for some $K$. Our
aim here is to shift to a model selection approach.

Let $\hat N_n$ be a random integer which will be defined later. For all $m \in \hat{\mathcal{M}}_n := \{1, \dots, \hat N_n\}$, we define:

$$\hat S_m := \mathrm{span}\{\hat\psi_1, \dots, \hat\psi_m\}$$

and the vector space $\hat S_m$ is an empirical counterpart of $S_m := \mathrm{span}\{\psi_1, \dots, \psi_m\}$. It is important to note that
a major difference appears here. Classical model selection is carried out with fixed and known families of
models. Here we handle random bases and this is the source of additional problems related to the convergence
of (possibly random) projectors associated to these finite-dimensional spaces. Other difficulties come from
the non-linear dependence between the coefficients of our estimator in the basis $(\hat\psi_1, \dots, \hat\psi_m)$ and the basis
itself.



1.2 Estimation on $\hat S_m$

Introduce the following simple least square contrast:

$$\gamma_n(t) := \frac{1}{n}\sum_{i=1}^n (Y_i - \langle t, X_i \rangle)^2.$$

Define $\hat g := \frac{1}{n}\sum_{i=1}^n Y_i X_i$, the empirical cross-covariance between $Y$ and $X$ (which is also the empirical counterpart
of the function $g$ appearing in Equation (2)), and

$$\hat\beta_m := \sum_{j=1}^m \frac{\langle \hat g, \hat\psi_j \rangle}{\hat\lambda_j}\,\hat\psi_j.$$

We can see easily that $\hat\beta_m$ is the unique minimizer of the least square contrast $\gamma_n$ on $\hat S_m$ if $\hat\lambda_m > 0$.

The noise variance $\sigma^2$ is a parameter that we cannot dismiss since it appears in many computations.
We distinguish two cases below.
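
The construction of $\hat\beta_m$ can be illustrated on discretized curves. The following is a minimal sketch, not the authors' implementation: it assumes the curves are observed on a uniform grid of step `dt` and approximates inner products by Riemann sums; the function name `fpcr_fit` and its arguments are hypothetical.

```python
import numpy as np

def fpcr_fit(X, Y, m, dt):
    """Sketch of the FPCR estimator on a uniform grid (assumption).

    X: (n, p) array of curves, Y: (n,) responses, m: dimension,
    dt: grid step used to approximate L2 inner products.
    """
    n = X.shape[0]
    # Discretized empirical covariance operator: (Gamma_n f)(s) ~ sum_t K[s, t] f(t) dt
    K = X.T @ X / n
    evals, evecs = np.linalg.eigh(K * dt)
    order = np.argsort(evals)[::-1][:m]        # m largest eigenvalues
    lam = evals[order]
    psi = evecs[:, order] / np.sqrt(dt)        # L2-normalised: sum(psi_j**2) * dt = 1
    g_hat = X.T @ Y / n                        # empirical cross-covariance on the grid
    coef = (psi.T @ g_hat) * dt / lam          # <g_hat, psi_j> / lambda_j
    beta_hat = psi @ coef
    return beta_hat, lam, psi
```

The returned `beta_hat` satisfies the normal equations of the least square contrast on the span of the first `m` empirical eigenfunctions, which is what makes it the minimizer when the `m`-th empirical eigenvalue is positive.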
1.3 Model selection with known noise variance

We suppose as a first step that the noise variance $\sigma^2$ is known.
We set

$$\hat N_n := \max\{N \in \mathbb{N}^*,\ N \le 20\sqrt{n}/\ln^3(n) \text{ and } \hat\lambda_N \ge s_n\},$$
where s n := ( 1 — ', and its theoretical counterpart 



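$$N_n := \max\{N \in \mathbb{N}^*,\ N \le 20\sqrt{n}/\ln^3(n) \text{ and } \lambda_N \ge n^{-2}\}.$$

The introduction of an empirical maximal dimension is motivated by the need to ensure that the terms
$\hat\lambda_j$ appearing in the definition of our estimator are not too small.
We select the dimension $\hat m^{(kv)} \in \hat{\mathcal{M}}_n$ by minimizing the criterion

$$\mathrm{crit}(m) = \gamma_n(\hat\beta_m) + \mathrm{pen}^{(kv)}(m) \qquad (5)$$

with

$$\mathrm{pen}^{(kv)}(m) := (1 + \theta)\frac{\sigma^2}{n}\, m, \qquad (6)$$

where $\theta$ is a positive constant. Then, we propose the following estimator of the function $\beta$:

$$\hat\beta^{(kv)} := \hat\beta_{\hat m^{(kv)}}.$$
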

1.4 Model selection with unknown noise variance 

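Let $\theta > 4$ and $\delta > 0$, we set

$$\hat N_n := \max\{N \in \mathbb{N}^*,\ N \le \min\{20\sqrt{n}/\ln^3(n),\ n/(\theta(1 + 2\delta))\} \text{ and } \hat\lambda_N \ge s_n\}$$

and

$$N_n := \max\{N \in \mathbb{N}^*,\ N \le \min\{20\sqrt{n}/\ln^3(n),\ n/(\theta(1 + 2\delta))\} \text{ and } \lambda_N \ge n^{-2}\};$$

the term $\sigma^2$ appearing in Equation (6) is replaced by the following estimator

$$\hat\sigma_m^2 := \frac{1}{n}\sum_{i=1}^n (Y_i - \langle \hat\beta_m, X_i \rangle)^2 = \gamma_n(\hat\beta_m).$$

The penalty becomes:

$$\widehat{\mathrm{pen}}(m) := \theta(1 + \delta)\,\hat\sigma_m^2\,\frac{m}{n},$$

and the selection criterion:

$$\hat m^{(uv)} \in \arg\min_{m \in \hat{\mathcal{M}}_n}\big(\gamma_n(\hat\beta_m) + \widehat{\mathrm{pen}}(m)\big) = \arg\min_{m \in \hat{\mathcal{M}}_n} \gamma_n(\hat\beta_m)\Big(1 + \theta(1 + \delta)\frac{m}{n}\Big). \qquad (7)$$

We also denote

$$\mathrm{pen}^{(uv)}(m) := \theta(1 + \delta)\,\sigma^2\,\frac{m}{n},$$

the theoretical counterpart of $\widehat{\mathrm{pen}}(m)$.

Finally, we define the following estimator:

$$\hat\beta^{(uv)} := \hat\beta_{\hat m^{(uv)}}.$$

In the sequel, when a property applies to both $\hat\beta^{(kv)}$ and $\hat\beta^{(uv)}$, we simply denote these estimators by $\hat\beta$;
in that case we also denote $\hat m^{(kv)}$ and $\hat m^{(uv)}$ by $\hat m$, and $\mathrm{pen}^{(kv)}(m)$ and $\mathrm{pen}^{(uv)}(m)$ by $\mathrm{pen}(m)$.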
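
The two selection rules can be sketched in a few lines on discretized curves. This is only an illustration under stated assumptions: inner products are Riemann sums on a uniform grid of step `dt`; the threshold $s_n$ is replaced by $n^{-2}$ (an assumption, since only its theoretical counterpart is used here); and the function `select_dimension` and the override `max_dim` are hypothetical names, not part of the paper.

```python
import numpy as np

def select_dimension(X, Y, dt, theta=5.0, delta=0.5, sigma2=None, max_dim=None):
    """Penalized-contrast dimension selection (sketch).

    sigma2 given  -> known-variance rule, criterion (5) with penalty (6).
    sigma2 = None -> unknown-variance rule, criterion (7) (requires theta > 4).
    """
    n = X.shape[0]
    evals, evecs = np.linalg.eigh((X.T @ X / n) * dt)
    order = np.argsort(evals)[::-1]
    lam = evals[order]
    psi = evecs[:, order] / np.sqrt(dt)
    if max_dim is None:
        # Cap on the dimension as in the definition of the maximal dimension
        max_dim = 20.0 * np.sqrt(n) / np.log(n) ** 3
        if sigma2 is None:
            max_dim = min(max_dim, n / (theta * (1.0 + 2.0 * delta)))
    max_dim = max(1, min(int(max_dim), len(lam)))
    # Keep leading eigenvalues above n^-2 (stand-in for the threshold s_n: assumption)
    ok = lam[:max_dim] >= n ** -2.0
    n_hat = max(1, int(np.argmin(ok)) if not ok.all() else max_dim)
    scores = X @ psi[:, :n_hat] * dt                      # <X_i, psi_j>
    coef = (psi[:, :n_hat].T @ (X.T @ Y / n)) * dt / lam[:n_hat]
    fitted = np.cumsum(scores * coef, axis=1)             # column m-1: <beta_hat_m, X_i>
    contrast = np.mean((Y[:, None] - fitted) ** 2, axis=0)
    dims = np.arange(1, n_hat + 1)
    if sigma2 is not None:
        crit = contrast + (1.0 + theta) * sigma2 * dims / n
    else:
        crit = contrast * (1.0 + theta * (1.0 + delta) * dims / n)
    m_hat = int(dims[np.argmin(crit)])
    beta_hat = psi[:, :m_hat] @ coef[:m_hat]
    return m_hat, beta_hat, crit
```

Note that the bound $20\sqrt{n}/\ln^3(n)$ is small for moderate sample sizes, which is why the sketch exposes `max_dim` as an explicit override for experimentation.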
2 Main results



In this section we derive oracle-type inequalities and uniform bounds for the risk associated to the prediction
error. The prediction error of an estimator $\hat\beta$ (see e.g. [16], [12]) is defined by

$$E\Big[\big(\hat Y_{n+1} - E[Y_{n+1} \mid X_{n+1}]\big)^2 \,\Big|\, (X_1, Y_1), \dots, (X_n, Y_n)\Big] = \|\Gamma^{1/2}(\hat\beta - \beta)\|^2 =: \|\hat\beta - \beta\|_\Gamma^2, \qquad (8)$$

where $\hat Y_{n+1} := \int_0^1 \hat\beta(t) X_{n+1}(t)\,dt$. We suppose that $\lambda_j > 0$ for all $j \ge 1$, which implies that the quantity (8)
defines a norm on $L^2([0,1])$ denoted by $\|\cdot\|_\Gamma$. This condition is necessary for the model to be identifiable.
Indeed, if there exists $j_0 \ge 1$ such that $\lambda_{j_0} = 0$, we have:

$$0 = \lambda_{j_0}\|\psi_{j_0}\|^2 = \langle \Gamma\psi_{j_0}, \psi_{j_0} \rangle = E[\langle X, \psi_{j_0} \rangle^2]$$

and $\langle X, \psi_{j_0} \rangle = 0$ almost surely. Consequently, if the slope function $\beta$ satisfies Equation (1), then any
slope function of the form $\beta + c\,\psi_{j_0}$, with $c \in \mathbb{R}$, also satisfies Equation (1): it is clearly impossible to identify
the slope function with our sample in that case. However this condition is not sufficient; for more details on
the problem of identifiability in functional linear models see Section 2 of Cardot et al.
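
For reference, the weighted-norm form of the prediction error mentioned in the introduction follows from the spectral decomposition of $\Gamma$. The short derivation below is a sketch assuming only that $X$ is centred with $\Gamma f = E[\langle f, X\rangle X]$ and that $(\lambda_j, \psi_j)_{j \ge 1}$ are the eigenelements of $\Gamma$:

```latex
\|f\|_\Gamma^2
  := \|\Gamma^{1/2} f\|^2
  = \langle \Gamma f, f \rangle
  = E\big[\langle f, X \rangle^2\big]
  = \sum_{j \ge 1} \lambda_j \langle f, \psi_j \rangle^2 ,
  \qquad f \in L^2([0,1]).
```

In particular the weights of this norm are the unknown eigenvalues $\lambda_j$, and the norm vanishes on $\psi_{j_0}$ whenever $\lambda_{j_0} = 0$.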

2.1 Assumptions 

Recall that $(\lambda_j, \psi_j)_{j \ge 1}$ denote the eigenelements of the covariance operator $\Gamma$. We can control the risk under
four assumptions:

H1 There exists $p > 4$ such that $\tau_p := E[|\varepsilon|^p] < +\infty$.
H2 There exists $b > 0$ such that, for all $l \in \mathbb{N}^*$,

$$\sup_{j \in \mathbb{N}^*} E\left[\frac{\langle X, \psi_j \rangle^{2l}}{\lambda_j^l}\right] \le l!\, b^{l-1}.$$

H3 For all $j \ne k$, $\langle X, \psi_j \rangle$ is independent of $\langle X, \psi_k \rangle$.

H4 There exists a constant $\gamma > 0$ such that the sequence $(j\lambda_j \ln^{1+\gamma}(j))_{j \ge 2}$ is decreasing.

Assumption H1 is standard in regression. Assumption H2 is necessary to apply exponential inequalities.
Assumption H3 is also classical and we know from the Karhunen-Loève decomposition of $X$ that it is true
for $X$ a Gaussian process (see [1, Section 1.4]). Moreover, note that for a general random variable
$X \in L^2([0,1])$, the random variables $\langle X, \psi_j \rangle$ and $\langle X, \psi_k \rangle$ are uncorrelated since, if $j \ne k$,
$E[\langle \psi_j, X \rangle \langle \psi_k, X \rangle] = \langle \Gamma\psi_j, \psi_k \rangle = 0$. The assumption on the sequence $(j\lambda_j \ln^{1+\gamma}(j))_{j \ge 2}$ allows us to avoid
more restrictive hypotheses about the spacing between eigenvalues, as frequently made in the
literature (see [8], [23], [22]).

In order to derive oracle inequalities for the risk associated to the prediction error, we need to specify
the decreasing rate of the sequence $(\lambda_j)_{j \ge 1}$. Usually in functional linear regression, this rate is supposed to
be polynomial (see for instance [8], [16], [9]) but more regular processes may be considered. That is the reason
why, following Cardot and Johannes [12J or Comte and Johannes [H], we consider also exponential rates. 



(P) Polynomial decrease. There exist two constants $a > 1$ and $c_P > 1$ such that, for all $j \ge 1$,

$$c_P^{-1} j^{-a} \le \lambda_j \le c_P\, j^{-a}.$$

(E) Exponential decrease. There exist two constants $a > 0$ and $c_E > 1$ such that, for all $j \ge 1$,

$$c_E^{-1} \exp(-j^a) \le \lambda_j \le c_E \exp(-j^a).$$
2.2 Upper-bound on the empirical risk

We define an empirical semi-norm naturally associated to our estimation problem by

$$\|f\|_{\Gamma_n}^2 := \|\Gamma_n^{1/2} f\|^2 = \frac{1}{n}\sum_{i=1}^n \langle f, X_i \rangle^2, \quad \text{for all } f \in L^2([0,1]).$$

As a first step, in Propositions 1 and 2 we prove that our estimators satisfy an oracle-type inequality for
the risk associated to this semi-norm, whatever the regularity of the slope function $\beta$ and the decreasing rate
of the covariance operator eigenvalues.



2.2.1 Bound on the empirical risk with known noise variance 
Proposition 1. Suppose that Assumption HI is fulfilled, we have 

E[\\P^ - Pf r j <C inf {E[||/3-n m /3||2j +P en( to )(m)} 



C'(a 2 + 



+ 



n 



with C, C > depending only on 9 and p and IL m the orthonormal projector onto S m . 

Proof. We want here to take advantage of Corollary 3.1 in Baraud [2], who provided a very similar result in
the context of regression on a fixed design. Indeed Baraud considers a model $Y = s(x) + \varepsilon$ where $Y$ is real,
$x$ takes values in some measurable space and $s$ is a general mapping. Conditioning our functional linear
model with respect to $\mathbf{X} := \{X_1, \dots, X_n\}$ we can switch from the model considered in Baraud to ours by
setting $s(x) = \langle \beta, x \rangle$. The semi-norm $\|\cdot\|_n$ becomes our $\|\cdot\|_{\Gamma_n}$ and the least square contrast is now:

$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n (Y_i - \langle t, X_i \rangle)^2.$$



Our last task consists in identifying the class of models, namely the S^s. Still sticking to Baraud's notation 
the latter should be subspaces of L 2 (H, ||-||„). It is simple to see that the following collection suits: 

S m := span j-i/^-, j = 1, C L 2 (M, ||-||J ,m = 1, ...,N n , 



through the identification mentioned above that is ipj(x) = (^ipj,xj. As Assumption HI is supposed to be 
verified, we are now ready to apply Corollary 3.1 of Baraud [2] with q = 1 and obtain that a.s. 

6 



An||„ <C{0) inf_ 
rri£Mr, 



n 



+ pen(m) ) H -a 2 , 

n 



(9) 



with 



e p : = 0(e, P )% ( i + E m ~ iP/2 ~ 2) } z c"(e, P )^ p , 



(TP' 



and Ex denotes the conditional expectation with respect to X. 
Noticing that we set earlier IT a = II m , Equation (9) leads to : 

•Jin 



"iiir„ 



< C(9) ini_ 



2 A C"(6,p) 2 
| rn + pen(m) J + a . 



n 



Now, we must ensure that the dimension of the oracle (i.e. the dimension that realises the best bias- variance 
compromise in A4 n ) is included in Ai n . Remark that, if m > N n , 

m) 

3>Nn 



fl^J\\ 2 r n + pen(iV n ) < ^ h < >" +Pen(m) + ^^ (N n 
moreover



£ \j < /3, 4>j > 2 = || 
j>N n +l 

since, for all j > N n , Xj < s n . Therefore 



nw3|lr„+ E ^ </3,^- > 2 < ||/3-n m /3||2 n+Sr 



% n /3||r„ + Pen(iV n ) < ||/3 - tl m P\\l n + pen(m) + (? , 



and we obtain, for all m € M n 



E x 



An[|r„ 



<C(e){\\/3-U m p\\ 2 r +pen(m)) + 



C"(9,p) 



(10) 



The proof is completed by taking expectation on both sides of the last inequality. 

2.2.2 Bound on the empirical risk with unknown noise variance 
Proposition 2. Suppose that Assumption HI is fulfilled. We have 



□ 



(ill)) 



\IJ<C inf {E[||/3-fl m /3||2j+pen(™)(m)} + ^( 



o 2 + 



with C, C > depends only on 9, p and 5. 



Proof. Because of the random penalty, we cannot proceed as in the proof of Proposition [T] The following 
proof is based on contrast decomposition and control of the remaining empirical process. More precisely, by 
definitions of rhS uv > and /3 m : 



and 
with: 



7n(£ M ) - 7n(n m /3) < pen(m) - pen(m^), 

in0 {uv) ) - 7n(n m /3) = 11/3 - P {uv) \\l n - W - n^HL + 2^(n m /3 - p^), 

i n 

v n (t) := ~y^Ei < t,Xi >, 



i=l 



an empirical linear centred process. Then: 

\\P - P {uv) Hrv, < 11/3 - n m /3||^ + pgfi(m) - pen(m^) + 2v n @W - U m P). (11) 
The first step is to replace the random function pen by its empirical counterpart pen^ uv \ this can be done 



by using the results of Lemma 13 in the Appendix directly in Equation ( |11[ ): 

-n m /3||2 +E x [pen(™)(m) -pen(™)(m(™))] 



P {uv) \\k 



+E ? 



2 1 + 



Km 



(uv) 



n 



m\ 
n J 



u n tfW - fl m p) 



+ 



T 2 ' P + 



l + a 2 



n 



(12) 



where k := 2(9 + 1). 

Then the last step consists in controlling the empirical linear process v n on S m \/ m . Remark that for all 
5 > 0, for all m G M n , 



2v n tfW - UmP) < l\\P iUV) ~ UmPWL + B 



sup z^(/), 



(13) 



1 
since for all $x, y \in \mathbb{R}$ and $\theta > 0$, $2xy \le \theta^{-1}x^2 + \theta y^2$.

Let $p(m, m') := 2(1 + \delta)\frac{m \vee m'}{n}\sigma^2$; remark that $\mathrm{pen}(m) + \mathrm{pen}(m') \ge p(m, m')$. Then, since $\theta > 4$ and
< n./«, gathering equations (12) and (13) we obtain: 



E, 



(uv) 



+2pen^(ro) + 20E x 



< 2 + 



n 



sup ^(/) -p(m,m (uu) ) 



\ ll/llr„=l 



/ 



Then the last step is to bound the variations of sup f a u n(f) (which can be seen as a variance term) 

ll/llr„=l 

around p(m,m'), the results comes from Lemma n] detailed below: 



i(uv) 



lr„ 



< C(0) min^ / 1|/5 — n 



,,, l | 2 n +pen(™)(m)} + 



C(p,<S) j2 



□ 



Then we conclude as in the proof of Proposition 1 by Inequality (10).

For the sake of clarity, Lemma 1, which is the key of the previous result, is given below.
Lemma 1. Suppose that Assumption H1 is fulfilled. Let $p(m, m') = 2(1 + \delta)\frac{m \vee m'}{n}\sigma^2$, then for all $m \in \hat{\mathcal{M}}_n$,

( \ 



m'EMn 



sup u£(f) - p(m,m) 



All/lkn^ 



/ 



< CM a 2 



n 



The proof of this lemma is given in Section 4.1 and relies on results of Baraud [2] based on Talagrand's
Inequality. 



2.3 Oracle inequality 

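In this section, we derive an oracle inequality for the risk associated to the prediction error. We first define
an ellipsoid of $L^2([0,1])$:

$$W_r^R := \Big\{ f \in L^2([0,1]),\ \sum_{j \ge 1} j^r \langle f, \psi_j \rangle^2 \le R^2 \Big\}.$$

Theorem 1. Suppose that assumptions H1, H2, H3 and H4 hold and that the decreasing rate of $(\lambda_j)_{j \ge 1}$
is given by (P) or (E). Then, for all slope functions $\beta \in L^2([0,1])$, if $n \ge 6$:

$$E\big[\|\hat\beta - \beta\|_\Gamma^2\big] \le C_1 \Big( \min_{m \in \mathcal{M}_n} \big( E[\|\beta - \hat\Pi_m\beta\|_\Gamma^2] + E[\|\beta - \hat\Pi_m\beta\|_{\Gamma_n}^2] + \mathrm{pen}(m) \big) \Big) + \frac{C_2}{n}\big(1 + \|\beta\|^2\big), \qquad (14)$$

where $C_1 > 0$ and $C_2 > 0$ are independent of $\beta$ and $n$.

If, in addition, $\beta \in W_r^R$ (with the condition $a + r > 2$ in the polynomial case (P)), we have

$$E\big[\|\hat\beta - \beta\|_\Gamma^2\big] \le C_1' \Big( \min_{m \in \mathcal{M}_n} \big( E[\|\beta - \hat\Pi_m\beta\|_\Gamma^2] + \mathrm{pen}(m) \big) \Big) + \frac{C_2'}{n}\big(1 + \|\beta\|^2\big), \qquad (15)$$

where the constants $C_1' > 0$ and $C_2' > 0$ do not depend on $\beta$ or $n$.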
Remark 1: The condition $a + r > 2$ is satisfied as soon as $a > 2$, without any condition on the regularity
parameter $r$ of the slope $\beta$. Note that if $X$ is a Brownian motion, the sequence $(\lambda_j)_{j \ge 1}$ associated to the
process $X$ satisfies (P) with $a = 2$. Hence we do not need any additional condition on $r$ if $X$ is smoother than
the Brownian motion.



Sketch of proof. The core of the proof relies on the bounds on the empirical risk given in Proposition 1
for the known variance case, and Proposition 2 for the unknown variance case. It then remains to replace
the empirical risk appearing in Propositions 1 and 2 by the risk associated to the prediction error in order
to obtain the final oracle inequality. This is done with the results of Lemma 2, which allows us to control the
set

$$A_n := \big\{ \forall f \in \hat S_n,\ \|f\|_\Gamma^2 \le \rho_0 \|f\|_{\Gamma_n}^2 \big\}, \qquad (16)$$

where $\rho_0 > 1$ is a constant and we define $\hat S_n := \hat S_{\hat N_n}$.
Proof. The following equality holds:

$$E\big[\|\hat\beta - \beta\|_\Gamma^2\big] = E\big[\|\hat\beta - \beta\|_\Gamma^2 \mathbf{1}_{A_n}\big] + E\big[\|\hat\beta - \beta\|_\Gamma^2 \mathbf{1}_{A_n^c}\big],$$

where, for a set $A$, we denote by $A^c$ its complement.

Lemma 4 in Section 4 allows us to bound the second term of this equality. Thus the end of the proof will
be devoted to upper-bounding the first term.

We remark that, for all m G M n , 



TJ-A,, 



< 



IT 



lrlA„ + 



< V^oll^-^lk + VPol 
By propositions [T] and [2j for all m G M. n '- 



n m /3|| r 
-n m /3||r„ + 



n 



mP r- 



(17) 



E[\\^-f3\\ 2 r J<C(p,e,5,a 2 ,r p )(E[ 



n m /3|lrj+pen(m) + — 



ii 



and by Equation (17) 



t IaJ < C(p, 6, 5, a z ,r p , po) E[||/3 - n m /3||^J + E[\\f3 - U m (3\\ 2 r ] + pen(m) + 



1 + 



n 



and Equation (14) follows. 



Then Equation (15) comes from Equation (14) and Lemma 12 



□ 



2.4 Convergence rates 

As a direct consequence of the oracle inequality given in Theorem 1, together with the control of the
random projector on the spaces $\hat S_m$, we derive uniform bounds on the risk of our
estimators on the ellipsoids $W_r^R$.



Theorem 2. Assume that the assumptions of Theorem 1 are fulfilled. For all $r > 0$ and $R > 0$:
Polynomial case. If (P) holds with $a + r > 2$ then:

$$\sup_{\beta \in W_r^R} E\big[\|\hat\beta - \beta\|_\Gamma^2\big] \le C_P\, n^{-(a+r)/(a+r+1)}; \qquad (18)$$

Exponential case. If (E) holds then:

$$\sup_{\beta \in W_r^R} E\big[\|\hat\beta - \beta\|_\Gamma^2\big] \le C_E\, n^{-1}(\ln n)^{1/a}, \qquad (19)$$

with $C_P$ and $C_E$ independent of $n$.



Remark 2: In the case where the noise $\varepsilon$ is Gaussian, the bounds (18) and (19) coincide with the minimax lower
bounds given by Cardot and Johannes [12].



Proof. Let us start with the polynomial case (P). By Theorem [lj we have: 

E[0- PWr] <c( min (e[\\(3 - fl m /3\\ 2 r ] + pen(m)) + -(1 + 11 ' 2 

with C independent of j3 and n. Denote by Ii m the orthogonal projector onto S m = Spanj^i, . . . ,tp m }, by 
Lemma [IT] 

11/3 - fimPfr < 2||/3 - U m Ml + d— m n«x{(i-r) + ,2-«-r} + ^ m 5 m In 4 n ^ max{(2 _ a+(7 _ r)+)+i2 _ a+(5 _ r)+} ^ 

n n z 

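with $C_1, C_2 > 0$, independent of $\beta$ and $n$. Now since $\beta \in W_r^R$, by (P),

$$\|\beta - \Pi_m\beta\|_\Gamma^2 = \sum_{j > m} \lambda_j \langle \beta, \psi_j \rangle^2 \le c_P\, m^{-a-r} \sum_{j > m} j^r \langle \beta, \psi_j \rangle^2 \le c_P R^2 m^{-a-r}.$$

We can see easily that it is possible to define a sequence of integers $(m_n^*)_{n \in \mathbb{N}^*}$ such that

$$m_n^* \asymp n^{1/(a+r+1)} \text{ and } m_n^* \le N_n \text{ for all } n \in \mathbb{N}^*,$$

where for two sequences $(a_k)_{k \ge 1}$ and $(b_k)_{k \ge 1}$, we write $a_k \lesssim b_k$ if there exists some constant $c > 0$ such that,
for all $k \ge 1$, $a_k \le c\,b_k$, and we write $a_k \asymp b_k$ if $a_k \lesssim b_k$ and $b_k \lesssim a_k$. The considerations above lead us to

$$\sup_{\beta \in W_r^R} E\big[\|\beta - \hat\Pi_{m_n^*}\beta\|_\Gamma^2\big] \lesssim n^{-(a+r)/(a+r+1)},$$

as soon as $a + r > 2$, and in addition

$$\mathrm{pen}(m_n^*) \lesssim n^{-(a+r)/(a+r+1)},$$

which leads to the expected bound.

The exponential case (E) is treated similarly with $m_n^* \asymp \ln^{1/a} n$. □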
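
The choice of the dimension in the polynomial case is a bias-variance balance. As a brief worked step (a sketch using only the bias bound of order $m^{-(a+r)}$ and a penalty of order $m/n$, constants omitted):

```latex
m^{-(a+r)} \asymp \frac{m}{n}
\iff m^{a+r+1} \asymp n
\iff m \asymp n^{1/(a+r+1)},
\qquad \text{so that} \qquad
m^{-(a+r)} \asymp \frac{m}{n} \asymp n^{-(a+r)/(a+r+1)} .
```

In the exponential case, taking $m \asymp (\ln n)^{1/a}$ gives $\lambda_m \lesssim \exp(-m^a) \asymp n^{-1}$, while the penalty term $m/n$ is of order $n^{-1}(\ln n)^{1/a}$, which is the rate in (19).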

3 Numerical results 
3.1 Simulation method 

Following the method proposed by Hall and Hosseini-Nasab [23], we simulate the random function $X$ in the
following way

$$X = \sum_{j=1}^J \sqrt{\lambda_j}\, \xi_j\, \psi_j, \qquad (20)$$

where, for all $j \ge 1$, $\psi_j(x) = \sqrt{2}\sin(\pi(j - 0.5)x)$ and $(\xi_j,\ j = 1, \dots, J)$ is independent and follows the
standard normal distribution. This sequence of functions $(\psi_j)_{j \ge 1}$ has been chosen so that if $J$ is sufficiently
high and if $\lambda_j = (j - 0.5)^{-2}\pi^{-2}$, we obtain a Brownian motion (see Ash and Gardner [1]). In order to see
how the decreasing rate of $(\lambda_j)_{j \ge 1}$ influences the estimation, we take three different sequences:

$$\lambda_j = j^{-2}, \quad \lambda_j = j^{-3} \quad \text{and} \quad \lambda_j = e^{-j}.$$

It is interesting to note from Equation (20) that the higher the rate of decrease of the $\lambda_j$'s, the better the
regularity of the function $X$.

The function $X$ is then discretized over $p = 100$ equispaced points
$t_j$, $j = 1, \dots, p$. We take $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ and $\sigma^2 = 0.01$. We consider here two different slope functions

$$\beta_1(x) = \ln(15x^2 + 10) + \cos(4\pi x) \text{ (see [10])} \quad \text{and} \quad \beta_2(x) = e^{-(x - 0.3)^2/0.05}\cos(4\pi x).$$
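
The simulation scheme can be sketched as follows. This is an illustration only: the grid `np.linspace(0, 1, p)`, the truncation level `J`, the Riemann-sum approximation of the integral and all function names are assumptions, since the exact discretization is not specified above.

```python
import numpy as np

def simulate_curves(n, p=100, J=50, decay="brownian", rng=None):
    """Draw n curves from the truncated expansion (20) on a uniform grid (assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 1.0, p)
    j = np.arange(1, J + 1, dtype=float)
    psi = np.sqrt(2.0) * np.sin(np.pi * (j[:, None] - 0.5) * t[None, :])  # (J, p)
    if decay == "brownian":
        lam = 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)
    elif decay == "poly2":
        lam = j ** -2.0
    elif decay == "poly3":
        lam = j ** -3.0
    elif decay == "exp":
        lam = np.exp(-j)
    else:
        raise ValueError("unknown decay: " + decay)
    xi = rng.standard_normal((n, J))          # independent standard normal scores
    X = (xi * np.sqrt(lam)) @ psi             # (n, p)
    return t, X

def beta1(t):
    return np.log(15.0 * t ** 2 + 10.0) + np.cos(4.0 * np.pi * t)

def beta2(t):
    return np.exp(-(t - 0.3) ** 2 / 0.05) * np.cos(4.0 * np.pi * t)

def simulate_responses(X, beta_vals, t, sigma2=0.01, rng=None):
    """Y_i = <beta, X_i> + eps_i, with the integral approximated by a Riemann sum."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t[1] - t[0]
    noise = np.sqrt(sigma2) * rng.standard_normal(X.shape[0])
    return X @ beta_vals * dt + noise
```

Every simulated curve vanishes at 0 because each $\psi_j$ does, which is the property behind the boundary behaviour of the estimators discussed in the results.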

3.2 Comparison with cross validation 

We compare our dimension selection criterion with two cross-validation criteria frequently used in practice.
The first method consists in minimizing

$$GCV(m) := \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{(1 - \mathrm{tr}(H_m)/n)^2},$$

where $\hat Y_i := \int_0^1 \hat\beta_m(t) X_i(t)\,dt$ and $H_m$ is the classical hat matrix defined by $\hat{\mathbf{Y}} = (\hat Y_1, \dots, \hat Y_n)' = H_m \mathbf{Y}$. This
criterion has been proposed in a similar context by Marx and Eilers [27] and in the context of functional
linear models by Cardot et al. [11]. The second one consists in minimizing the criterion

$$CV(m) := \frac{1}{n}\sum_{i=1}^n \big(Y_i - \hat Y_i^{(-i)}\big)^2,$$

which has been proposed in the framework of the functional linear model by Hall and Hosseini-Nasab [23]. Here
$\hat Y_i^{(-i)}$ is the value of $Y_i$ predicted from the sample $\{(X_j, Y_j),\ j \ne i\}$. Note that an immediate drawback of
this criterion is that it requires a much longer CPU time than the GCV criterion or our penalized criterion.
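
Both criteria can be sketched on discretized curves. The code below is a minimal illustration, not the implementation used for the simulations: it assumes a uniform grid of step `dt`, recomputes the empirical eigenbasis for every left-out observation (which is why CV is slow), and uses the fact that the hat matrix of a least squares fit on `m` empirical components has trace `m` when the `m`-th empirical eigenvalue is positive. The names `fpcr_beta`, `gcv` and `loo_cv` are hypothetical.

```python
import numpy as np

def fpcr_beta(X, Y, m, dt):
    """FPCR slope estimate on a uniform grid (Riemann-sum inner products)."""
    n = X.shape[0]
    evals, evecs = np.linalg.eigh((X.T @ X / n) * dt)
    order = np.argsort(evals)[::-1][:m]
    lam = evals[order]
    psi = evecs[:, order] / np.sqrt(dt)
    coef = (psi.T @ (X.T @ Y / n)) * dt / lam
    return psi @ coef

def gcv(X, Y, m, dt):
    """Generalized cross-validation criterion with tr(H_m) = m."""
    n = X.shape[0]
    resid = Y - X @ fpcr_beta(X, Y, m, dt) * dt
    return np.sum(resid ** 2) / (1.0 - m / n) ** 2

def loo_cv(X, Y, m, dt):
    """Leave-one-out cross-validation: one full refit per observation."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = fpcr_beta(X[keep], Y[keep], m, dt)
        errs[i] = (Y[i] - X[i] @ b * dt) ** 2
    return errs.mean()
```

A dimension would then be chosen by evaluating `gcv` or `loo_cv` over a range of candidate values of `m` and keeping the minimizer.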

3.3 Results 

As we can see in Figures 1 and 2, the regularity of the estimators increases with the rate of decrease of
the $\lambda_j$'s. This is a specificity of functional PCA: the estimated slope function $\hat\beta$ is an element of
$\mathrm{Im}(\Gamma_n) = \mathrm{span}\{X_1, \dots, X_n\}$ and thus has the same regularity as the function $X$. It also explains the boundary
effect observed in Figures 1 and 2, since $X_i(0) = 0$ implies that $\hat\beta(0) = 0$.
Figure 2: Plot of $\beta_2$ (bold, dashed) and $\hat\beta^{(kv)}$ computed for 10 independent samples of size n = 1000.









                           $\beta_1$                                      $\beta_2$
$\lambda_j$   Method   n = 200       n = 500       n = 1000      n = 200       n = 500       n = 1000
$j^{-2}$      kv       12.8 ± 0.4    5.9 ± 0.2     3.57 ± 0.08   4.7 ± 0.3     1.88 ± 0.09   0.89 ± 0.05
              uv       12.5 ± 0.4    5.8 ± 0.2     3.51 ± 0.08   4.7 ± 0.3     1.89 ± 0.09   0.89 ± 0.05
              GCV      80 ± 2        55 ± 2        47 ± 2        80 ± 2        55 ± 2        47 ± 2
              CV       12.2 ± 0.5    5.6 ± 0.2     3.34 ± 0.09   5.7 ± 0.4     2.2 ± 0.2     1.08 ± 0.06
$j^{-3}$      kv       6.7 ± 0.3     3.3 ± 0.1     1.84 ± 0.06   5.5 ± 0.2     1.8 ± 0.1     0.88 ± 0.04
              uv       6.6 ± 0.3     3.2 ± 0.1     1.83 ± 0.06   5.4 ± 0.3     1.8 ± 0.1     0.88 ± 0.04
              GCV      18.4 ± 0.5    12.6 ± 0.3    9.3 ± 0.2     18.5 ± 0.5    12.7 ± 0.3    9.5 ± 0.2
              CV       7.3 ± 0.4     3.3 ± 0.2     1.78 ± 0.07   5.1 ± 0.4     2.0 ± 0.2     1.05 ± 0.08
$e^{-j}$      kv       4.8 ± 0.2     2.05 ± 0.08   1.12 ± 0.05   5.0 ± 0.2     1.9 ± 0.1     0.78 ± 0.05
              uv       4.7 ± 0.2     2.03 ± 0.08   1.11 ± 0.05   4.9 ± 0.2     1.82 ± 0.09   0.77 ± 0.05
              GCV      5.8 ± 0.3     2.67 ± 0.09   1.45 ± 0.05   6.0 ± 0.3     2.7 ± 0.1     1.40 ± 0.05
              CV       4.8 ± 0.3     2.05 ± 0.09   1.10 ± 0.05   4.6 ± 0.3     1.8 ± 0.1     0.87 ± 0.05

Table 1: Mean prediction error ($\times 10^4$) and approximated 95% confidence interval (calculated from 500
independent samples of size n = 1000). kv: dimension selected by minimization of (5), uv: dimension
selected by (7).



Figure 3: Left: comparison of estimators $\hat\beta_m$ when $m$ is selected by minimization of the penalized criterion
crit defined by (5) or the CV criterion. Right: comparison with the GCV criterion, n = 2000, $\lambda_j =$

The results of Table 1 indicate that the substitution of the term $\sigma^2$ by the estimator $\hat\sigma_m^2$ in the case of
unknown variance does not have a significant effect on the quality of estimation. In fact the Monte Carlo
study also revealed that the dimension selected by minimization of (5) and (7) is the same in 70% to 99%
of cases (the percentage depends on the sample size $n$, the decreasing rate of the $\lambda_j$'s and the function $\beta$).

Moreover, according to Figure 3 and Table 1, the performances of our estimators seem to be quite similar
to those of the functional PCR estimator with dimension selected by minimization of the CV criterion. By contrast,
the GCV criterion systematically selects the highest-dimensional model, which leads to poor performances.



4 Proofs 

4.1 Proof of Lemma 1

Proof of Lemma\^ First denote by /x := (< /, X\ >,...,< f,X n >)' and e :- 
v n{f) = fx^/ n an d that, by usual properties of orthogonal projectors 



(si, E n )'. Remark that 



sup v n {f) = sup — = —= sup OLE = -^(tt$ m (eyn^e) 

a 1 a=n a'a=l 



1/2 



/G5 m 
l/llr„=l 



where s m denotes the subspace of W 1 defined by 

s m := {a G R n , 3f G S m , a = / x } 



and n^ m is the orthogonal projector onto s r , 
Then, by Assumption H1, applying Corollary 5.1 of Baraud (2000) [2] with $A = \Pi_{s_m}$ we obtain, for all
$x > 0$,

/ \ 



n sup v n {f) > ma + 2a \Jmx + a x 
feSm 
V ||/||r„=l 



< C{p)a^r 1 



m 



/ 



where Px stands for the probability given X. Then for all 5 > remark that \fmx < 5m + 5 'iwe obtain 



sup ^ (/) > (1 + (5) !^ + (1 + r i ) ^ 

n n 



\ 



Set 



fes m 
Vll/llr„=l 



<C(P) 



77? 



p/2- 



m\lm 



J 

( \ 

sup vl(f) - p(m,m!) 



\ll/llr„=l 



/ 



We have, for all m, m' S M. 7 



r+oo 

Ex [QmVm'] = / Fx {QmVm> > *) eft 
JO 



p />+oo 



2 

< Ctp^-trnVm') 1 ^ 2 . 

7? 



As p > 4, (tt? V m') 1 ^/ 2 < 1 and we obtain the expected result. □ 
4.2 Upper-bound of the risk on $A_n^c$

We first bound the probability of $A_n^c$:

Lemma 2. Under assumptions H2, H3 and H4 and if the decreasing rate of (^j)j>i is given by (P) or 
(E), the set A n defined by Equation (16) verifies 

P(A^) < Cjn\ 

with C > independent of n. 
Proof. First remark that: 

P(A£) = P(A° n {N n < N n }) + p(a£ n {N n > N n }), 

the second term of this equality is easily bounded by Cn~ 6 by Lemma[5j It remains to bound the first term. 
We have 



A^ n {N n <N n }=\ inf < p Q 1 \ n {N n <N n }c{ inf — ^ < p 



fes n H/iir 



Let / = 5^1=1 "i^j e •S'jVn) we nave a - s - 



3=1 
where $|\cdot|_2$ is the Euclidean norm of $\mathbb{R}^{\hat N_n}$ defined by $|x|_2^2 = \sum_{j} x_j^2$ for all $x = (x_1, \dots, x_{\hat N_n})' \in \mathbb{R}^{\hat N_n}$ and $\Lambda_n$ is the diagonal
matrix with diagonal entries $\{\sqrt{\hat\lambda_1}, \dots, \sqrt{\hat\lambda_{\hat N_n}}\}$.
Moreover



iv„ 



j,fc=l 

where ^ n is the symmetric and positive-definite matrix 



a, 



l<j,k<N n 



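Then,

$$\inf_{f\in\hat S_{N_n}}\frac{\|f\|^2_{\Gamma_n}}{\|f\|^2_\Gamma} = \inf_{\alpha\in\mathbb{R}^{N_n}\setminus\{0\}}\frac{|\hat\Lambda_n\alpha|_2^2}{\alpha'\Psi_n\alpha} = \inf_{\alpha\in\mathbb{R}^{N_n}\setminus\{0\}}\frac{|\hat\Lambda_n\Psi_n^{-1/2}\Psi_n^{1/2}\alpha|_2^2}{|\Psi_n^{1/2}\alpha|_2^2}.$$

Now

$$\inf_{a\in\mathbb{R}^{N_n}\setminus\{0\}}\frac{a'\Psi_n^{-1/2}\hat\Lambda_n^2\Psi_n^{-1/2}a}{|a|_2^2} = \min\{\lambda,\ \lambda\text{ eigenvalue of }\Psi_n^{-1/2}\hat\Lambda_n^2\Psi_n^{-1/2}\}.$$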



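On the set $\mathcal{J}_n$ defined by Equation (30) in Section A.2, by Lemma 7, $\hat\lambda_j > 0$ for all $j = 1,\dots,N_n$. Hence the matrix $\hat\Lambda_n$ is invertible, therefore

$$\inf_{f\in\hat S_{N_n}}\frac{\|f\|^2_{\Gamma_n}}{\|f\|^2_\Gamma} = \rho\big(\Psi_n^{1/2}\hat\Lambda_n^{-2}\Psi_n^{1/2}\big)^{-1} = \rho\big(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}\big)^{-1},$$

where, for a matrix $A$, $\rho(A) = \max\{|\lambda|,\ \lambda\text{ is a complex eigenvalue of }A\}$ denotes the spectral radius of $A$.
We have then

$$\mathbb{P}\big(\Delta_n^c\cap\{\hat N_n\le N_n\}\big) \le \mathbb{P}\big(\mathcal{J}_n\cap\{\rho(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}) > \rho_0\}\big) + \mathbb{P}(\mathcal{J}_n^c). \qquad (21)$$

By Lemma 9 in Section A.2, $\mathbb{P}(\mathcal{J}_n^c) \le C/n^6$, with $C$ depending only on $\Gamma$ and $b$. Thus it remains to control the spectral radius of $\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}$.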
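The reduction used in this step turns a ratio of quadratic forms into a spectral radius: for an invertible diagonal matrix $\hat\Lambda_n$ and a symmetric positive-definite matrix $\Psi_n$, the infimum of $|\hat\Lambda_n\alpha|_2^2/\alpha'\Psi_n\alpha$ over $\alpha \ne 0$ equals $1/\rho(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1})$. The NumPy sketch below checks this on random matrices; the dimension, the random draws and the variable names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Hypothetical stand-ins for the empirical eigenvalues and for Psi_n.
lam = rng.uniform(0.5, 2.0, size=N)
Lam = np.diag(np.sqrt(lam))          # diagonal entries sqrt(lambda_j)
A = rng.standard_normal((N, N))
Psi = A @ A.T + N * np.eye(N)        # symmetric positive-definite

Lam_inv = np.linalg.inv(Lam)
M = Lam_inv @ Psi @ Lam_inv          # symmetric, so eigh applies
w, V = np.linalg.eigh(M)             # eigenvalues in ascending order
rho = np.abs(w).max()                # spectral radius of M

# The infimum is attained at alpha = Lam^{-1} v, v the top eigenvector of M.
alpha_star = Lam_inv @ V[:, -1]
attained = np.sum((Lam @ alpha_star) ** 2) / (alpha_star @ Psi @ alpha_star)

# Random directions never go below 1 / rho.
alphas = rng.standard_normal((N, 5000))
num = np.sum((Lam @ alphas) ** 2, axis=0)
den = np.einsum("ij,ik,kj->j", alphas, Psi, alphas)
sampled_inf = (num / den).min()

print(1.0 / rho, attained, sampled_inf)
```

In particular the event that the infimum falls below $\rho_0^{-1}$ is the event that the spectral radius exceeds $\rho_0$.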

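We define a linear (random) map $O$ from $\mathbb{R}^{N_n}$ to $L^2([0,1])$ by:

$$O : \alpha = (\alpha_1,\dots,\alpha_{N_n})' \mapsto \sum_{j=1}^{N_n}\alpha_j\hat\psi_j.$$

We denote by $O^*$ the adjoint of $O$, which is the linear map from $L^2([0,1])$ to $\mathbb{R}^{N_n}$ defined by:

$$O^* : f \mapsto \big(\langle f,\hat\psi_j\rangle\big)_{1\le j\le N_n}.$$

We can check that $OO^* = \hat\Pi_{N_n}$ and that $\Psi_n$ is the matrix of the linear map $O^*\Gamma O$ in the standard basis of $\mathbb{R}^{N_n}$.
It is known that the spectral radius of an operator is equal to the spectral radius of its adjoint; then,

$$\rho(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}) = \rho(C_n^{-1}O^*\Gamma OC_n^{-1}) = \rho(\Gamma^{1/2}OC_n^{-1}C_n^{-1}O^*\Gamma^{1/2}), \qquad (22)$$

where $C_n$ denotes the linear endomorphism of $\mathbb{R}^{N_n}$ whose matrix in the standard basis is $\hat\Lambda_n$. Denote by $\Pi_{N_n}$ the orthogonal projector onto $S_{N_n} = \mathrm{span}\{\psi_1,\dots,\psi_{N_n}\}$. Moreover, let $\Gamma^\dagger$ (resp. $\Gamma_n^\dagger$) be the pseudo-inverse of the operator $\Gamma$ (resp. $\Gamma_n$) on $S_{N_n}$ (resp. $\hat S_{N_n}$), defined by:

$$\Gamma^\dagger f := \sum_{j=1}^{N_n}\frac{\langle f,\psi_j\rangle}{\lambda_j}\psi_j \quad\text{and}\quad \Gamma_n^\dagger f := \sum_{j=1}^{N_n}\frac{\langle f,\hat\psi_j\rangle}{\hat\lambda_j}\hat\psi_j; \qquad (23)$$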
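we have $\Gamma^{1/2}\Gamma^\dagger\Gamma^{1/2} = \Pi_{N_n}$ and $OC_n^{-2}O^* = \Gamma_n^\dagger$. Then,

$$\Gamma^{1/2}OC_n^{-1}C_n^{-1}O^*\Gamma^{1/2} = \Gamma^{1/2}\Gamma_n^\dagger\Gamma^{1/2} = \Gamma^{1/2}(\Gamma^\dagger + \Gamma_n^\dagger - \Gamma^\dagger)\Gamma^{1/2} = \Pi_{N_n} + \Gamma^{1/2}(\Gamma_n^\dagger - \Gamma^\dagger)\Gamma^{1/2},$$

and by Equation (22)

$$\rho(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}) = \big\|\Pi_{N_n} + \Gamma^{1/2}(\Gamma_n^\dagger - \Gamma^\dagger)\Gamma^{1/2}\big\|_\infty \le 1 + \big\|\Gamma^{1/2}(\Gamma_n^\dagger - \Gamma^\dagger)\Gamma^{1/2}\big\|_\infty,$$

where $\|\cdot\|_\infty$ denotes the usual operator norm.
Now

$$\mathbb{P}\big(\mathcal{J}_n\cap\{\rho(\hat\Lambda_n^{-1}\Psi_n\hat\Lambda_n^{-1}) > \rho_0\}\big) \le \mathbb{P}\big(\mathcal{J}_n\cap\{\|\Gamma^{1/2}(\Gamma_n^\dagger - \Gamma^\dagger)\Gamma^{1/2}\|_\infty > \rho_0 - 1\}\big).$$

Thus, the result of Lemma 3 (whose technical proof is given in Section A.5), plugged into Equation (21), allows us to bound $\mathbb{P}(\Delta_n^c\cap\{\hat N_n\le N_n\})$. Then the proof is finished by Lemma 5. □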

Lemma 3. Suppose that assumptions H2, H3 and H4 are fulfilled and that the decreasing rate of $(\lambda_j)_{j\ge1}$ is given by (P) or (E), then



j n n {lir 1 / 2 ^ _ rt)r 1 / 2 || oc > Po _ i}) < C n 



with $C > 0$ independent of $n$.

Lemma 4. For all $\beta \in L^2([0,1])$, if the assumptions of Lemma 2 are fulfilled then

E[0-p\\ r i Ai \<^(i+m\ 2 ), 

with $C > 0$ independent of $\beta$ and $n$.

Proof. First remark that, as Yj =< (3, X{ > +£i, for all m 6 A4 n 

1 ^ v <X i ,i; j > 



Z — ^ Z — ^ \ 



n 

j=l i=l 



A 



where 

Then 

E[ 



m -, n 



^ m : - E ^ E 



1 . < X h^3 > 



n 

j=i i=l 



i>5- 



|rl A c] < 2E 



|fl A /3 - /3||^l A c 



+ 2E 



l^m||rlA c 



<2||/3||2p(A£) + 2E 



|-^rh||rlA c 



The first term can be easily bounded using the results of Lemma 2. We then focus on the second term; the idea is to bound the quantity $\|R_{\hat m}\|_\Gamma$ by $\|R_{\hat m}\|$, which can be written simply:



E 



I -f^rri 1 1 r 1 a c 



< p(r)E 



|-Rm|| l^C 



P (r)E 



E < R rh, 4>j > 2 1 A C 



Now remark that, since m < N n < 20-^/n, 

rh if i / , n -y ? 



20 IV™] 



i=i 



j=i \ i=i 



A, 



^ E ^E 



i <x i ,4> j > 



i=i \ i=i 



L {A,->Q}- 



Then we have, by independence of Si with Xi and A r 



E 



l-Rmllr-'-A 1 ' 



< 20p(r)^ S ; 1 P(A^), 



and the results come from lemmas [2] and [5] and the definition of s r , 



□ 
Lemma 5. If Assumption H2 is fulfilled and $n > 6$, then

$$\mathbb{P}(\hat N_n > N_n) \le Cn^{-6},$$

with $C$ independent of $\beta$ and $n$.

Proof. The sequence (Xj)j>i being non-increasing we have 

P(Nn > N n ) < P(X Nn+ i > A^J < P ({\ Nn+ i > A^ } n A n ) + P (4) , 



where $\mathcal{A}_n$ is defined by Equation (26) in Section A.2. Then by definition of $\mathcal{A}_n$,

6* 



{x Nn+1 > A^ n } n An c {a^+i > Xf, n - ^} 



A - —2 

since Aat 71+ i < n -2 , Ajv„ > Sn and 5^ < < Thus the proof is finished following the conclusions of 
Remark [3] in Section [A. 21 □ 
Appendix A Perturbation theory background

Many intermediate results are based on perturbation theory. We give in this section some preliminary results on this subject. The aim is to control the proximity between the random space $\hat S_m$ spanned by the eigenfunctions of $\Gamma_n$ and the space $S_m$ spanned by the eigenfunctions of $\Gamma$.

Recall that $\hat\Pi_m$ (resp. $\Pi_m$) denotes the orthogonal projector onto $\hat S_m$ (resp. $S_m$) and $\hat\pi_j$ (resp. $\pi_j$) denotes the orthogonal projector onto $\mathrm{span}\{\hat\psi_j\}$ (resp. $\mathrm{span}\{\psi_j\}$). We write the difference of projectors $\hat\Pi_m - \Pi_m$ (or equivalently $\hat\pi_j - \pi_j$) explicitly in terms of the operator difference $\Gamma - \Gamma_n$, which is easier to handle.

A.1 Exponential inequalities

In the proofs, we use the following version of Bernstein's inequality:

Lemma 6 (Birgé and Massart (1998) [5]). Let $Z_1,\dots,Z_n$ be independent random variables satisfying the moment conditions

$$\frac1n\sum_{i=1}^n\mathbb{E}\big[|Z_i|^m\big] \le \frac{m!}{2}v^2c^{m-2},\quad\text{for all } m\ge2,$$

for some positive constants $v$ and $c$. Then, for any positive $\varepsilon$,

$$\mathbb{P}\Big(\frac1n\sum_{i=1}^n\big(Z_i - \mathbb{E}[Z_i]\big) \ge \varepsilon\Big) \le \exp\Big(-\frac{n\varepsilon^2}{2(v^2+c\varepsilon)}\Big).$$

We also use a version of Lemma 6 for Hilbert-valued random variables, which can be found in Bosq [6].
Figure 4: Rectangular contour

Figure 5: Contour made of disjoint circles

A.2 Preliminary notions

Let $\gamma$ be either the rectangular path given by Figure 4 or the union (for $j = 1,\dots,m$) of the circular paths $\partial\Omega_j$ of center $\lambda_j$ and radius $\delta_j$ represented in Figure 5.
We have, for all $j \ge 1$, $x \in [0,1]$:

$$\Pi_m\psi_j(x) = \mathbf{1}_{j\le m}\psi_j(x) = \frac{1}{2i\pi}\int_\gamma\frac{\psi_j(x)}{\zeta-\lambda_j}\,d\zeta = \frac{1}{2i\pi}\int_\gamma(\zeta I-\Gamma)^{-1}\psi_j(x)\,d\zeta,$$

and

$$\Pi_m = \frac{1}{2i\pi}\int_\gamma(\zeta I-\Gamma)^{-1}\,d\zeta; \qquad (24)$$

we refer to Chapter III of Dunford and Schwartz [18] for an exact definition and properties of this integral.

Now the aim is to write similarly the random projector $\hat\Pi_m$. This can be done if, for all $j \le m$, $\hat\lambda_j$ is in the interior of $\gamma$.
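The contour-integral representation of a spectral projector can be illustrated numerically in finite dimension. The sketch below is only an illustration under assumed data: a $6\times6$ symmetric matrix `Gamma` with hand-picked eigenvalues stands in for the covariance operator, a single circle enclosing the $m = 2$ largest eigenvalues plays the role of the contour, and the integral is approximated by the trapezoidal rule. None of these choices comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 6, 2

# Hypothetical spectrum and eigenvectors of a finite-dimensional "Gamma".
eigvals = np.array([1.0, 0.5, 0.25, 0.12, 0.06, 0.03])
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
Gamma = U @ np.diag(eigvals) @ U.T

# Circle enclosing the m largest eigenvalues and no other eigenvalue.
gap = eigvals[m - 1] - eigvals[m]
center = 0.5 * (eigvals[0] + eigvals[m - 1])
radius = 0.5 * (eigvals[0] - eigvals[m - 1]) + 0.5 * gap

# Trapezoidal rule for (1 / 2 i pi) * contour integral of (zeta I - Gamma)^{-1}.
K = 400
theta = 2 * np.pi * np.arange(K) / K
zeta = center + radius * np.exp(1j * theta)
dzeta = 1j * radius * np.exp(1j * theta) * (2 * np.pi / K)
I = np.eye(d)
proj = sum(np.linalg.inv(z * I - Gamma) * dz for z, dz in zip(zeta, dzeta))
proj = proj / (2j * np.pi)

# Exact orthogonal projector onto the span of the first m eigenvectors.
exact = U[:, :m] @ U[:, :m].T

print(np.abs(proj - exact).max())
```

The same computation with `Gamma` replaced by a perturbed matrix gives the perturbed projector as long as its first `m` eigenvalues, and only those, stay inside the circle; this is the condition stated above for the random projector.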

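For $t > 0$, let $\mathcal{J}_\gamma(t)$ be the set

$$\mathcal{J}_\gamma(t) = \Big\{\sup_{\zeta\in\mathrm{supp}(\gamma)}\|T(\zeta)\|_\infty < t\Big\} \qquad (25)$$

where $T(\zeta) := R^{1/2}(\zeta)(\Gamma_n-\Gamma)R^{1/2}(\zeta)$ with $R(\zeta) = (\zeta I-\Gamma)^{-1}$. Define also the set

$$\mathcal{A}_n := \bigcap_j\Big\{|\hat\lambda_j-\lambda_j| < \frac{\delta_j}{2}\Big\}. \qquad (26)$$

Lemma 7. Let $t \le 1/2$, then for both circular and rectangular path $\gamma$, $\mathcal{J}_\gamma(t) \subset \mathcal{A}_n$.

The proof of Lemma 7 may be deduced from the proof of Lemma 14, p. 12 of Mas and Ruymgaart [28].
Now Lemma 7 allows us to write that for all $t \le 1/2$:

$$\hat\Pi_m\mathbf{1}_{\mathcal{J}_\gamma(t)} = \mathbf{1}_{\mathcal{J}_\gamma(t)}\frac{1}{2i\pi}\int_\gamma(\zeta I-\Gamma_n)^{-1}\,d\zeta. \qquad (27)$$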
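Define $\hat R(\zeta) := (\zeta I-\Gamma_n)^{-1}$; equations (24) and (27) lead to:

$$(\hat\Pi_m-\Pi_m)\mathbf{1}_{\mathcal{J}_\gamma(t)} = \mathbf{1}_{\mathcal{J}_\gamma(t)}\frac{1}{2i\pi}\int_\gamma\big(\hat R(\zeta)-R(\zeta)\big)\,d\zeta,\quad\text{for all } t\le1/2.$$

Then we rewrite the integrand of the last integral by remarking that

$$\hat R(\zeta)-R(\zeta) = \hat R(\zeta)(\Gamma_n-\Gamma)R(\zeta) = \hat R(\zeta)(\zeta I-\Gamma)^{1/2}T(\zeta)R^{1/2}(\zeta). \qquad (28)$$

By definition, when $t < 1$, on the set $\mathcal{J}_\gamma(t)$, the operator $I-T(\zeta)$ is invertible for all $\zeta\in\mathrm{supp}(\gamma)$ and we have:

$$(I-T(\zeta))^{-1} = (\zeta I-\Gamma)^{1/2}\hat R(\zeta)(\zeta I-\Gamma)^{1/2}.$$

Then Equation (28) leads to

$$\big(\hat R(\zeta)-R(\zeta)\big)\mathbf{1}_{\mathcal{J}_\gamma(t)} = R^{1/2}(\zeta)[I-T(\zeta)]^{-1}T(\zeta)R^{1/2}(\zeta)\mathbf{1}_{\mathcal{J}_\gamma(t)}.$$

We obtain in a similar way a rewriting of $(\hat\pi_j-\pi_j)\mathbf{1}_{\mathcal{J}_\gamma(t)}$ or $(\Gamma_n^\dagger-\Gamma^\dagger)\mathbf{1}_{\mathcal{J}_n}$, where $\Gamma^\dagger$ and $\Gamma_n^\dagger$ are defined by Equation (23). All results are summarized in the following lemma.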

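Lemma 8. For all $t \le 1/2$, if $\gamma$ is either the union of circular contours represented in Figure 5 or the rectangular contour given by Figure 4:

$$(\hat\pi_j-\pi_j)\mathbf{1}_{\mathcal{J}_\gamma(t)} = \mathbf{1}_{\mathcal{J}_\gamma(t)}\frac{1}{2i\pi}\int_{\partial\Omega_j}R^{1/2}(\zeta)[I-T(\zeta)]^{-1}T(\zeta)R^{1/2}(\zeta)\,d\zeta,$$

$$(\hat\Pi_m-\Pi_m)\mathbf{1}_{\mathcal{J}_\gamma(t)} = \mathbf{1}_{\mathcal{J}_\gamma(t)}\frac{1}{2i\pi}\int_\gamma R^{1/2}(\zeta)[I-T(\zeta)]^{-1}T(\zeta)R^{1/2}(\zeta)\,d\zeta,$$

$$(\Gamma_n^\dagger-\Gamma^\dagger)\mathbf{1}_{\mathcal{J}_\gamma(t)} = \mathbf{1}_{\mathcal{J}_\gamma(t)}\frac{1}{2i\pi}\int_\gamma\frac{1}{\zeta}R^{1/2}(\zeta)[I-T(\zeta)]^{-1}T(\zeta)R^{1/2}(\zeta)\,d\zeta.$$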



The last lemma allows us to finally control our quantities on J^(t)^ . 
Lemma 9. Denote by 



A, 



+ — -, /or all k >1. 



If j is the path d£lk covering the circle of center A& and of radius 5k or the rectangular contour given in 
Figure^ we have, under Assumption H2 : 



p(.7 7 C (*)) <2exp 



2a|(26 - 1) (26 - 1) + 256b 3 /((2b - !) a fc* / ' 



The proof of Lemma 9 relies on Bernstein's exponential inequality for Hilbert-valued random variables (see for instance Bosq [6]). Details can be found in Mas and Ruymgaart [28, Lemma 13].



Definition 1. Let, for all t > 0, J^it) be the set defined by Equation (25) then we define a set J n in the 
following way: take lj^ n := min Inn, 1/2 1 , 

Circular contour 

m 

Jn-=f]Jda j {lj,ny, (29) 

3=1 

Rectangular contour 



(30) 
Remark 3: By Lemma 9, the sets $\mathcal{J}_n$ defined by Equation (29) or (30) verify

$$\mathbb{P}(\mathcal{J}_n^c) \le 2\exp(-c^*\ln^2n),$$
where c* > depends only on b and eg. Moreover, as lj jTl < 1/2, by Lemma [7| we have 

and the results of Lemma [8] are true. 



(31) 



A.3 Upper bound on the distance between empirical and theoretical projectors

We need a preliminary lemma.

Lemma 10 (Hilgert et al. [23], Lemma 10.1). If Assumption H4 is verified, then for all $k \in \mathbb{N}^*$:

&k < C(j)klnk 

Lemma 11. Let r,R> and f3 £ W/*. Suppose that assumptions H2, H3 and H4 are fulfilled. If {\j)j>i 
decreases at polynomial rate (P) then 

E[||n m /3 - U m f3\\ 2 r l Jn ] < d— mm a X {(i-r) + ,2-a- r} + ^ jg° ™ In 4 n mmax{(2 _ a+(7 _ r)+)+i2 _ a+(5 „ r)+} 



y2 2~ 



and if (Xj)j>i decreases at exponential rate (E) 



„ „ „,,9 n „ In 3 m ft „s ln 6 mln 4 n 

E[||n m /3 - n m /3||2i Jn ] < c 2 ^^m( 1 - r )+ + 2 — . 



n 



with $C_1 > 0$, $C_2 > 0$ and $C_3 > 0$ depending on $r$, $R$ and on the sequence $(\lambda_j)_{j\ge1}$ but independent of $m$ and $n$.



Proof. First, by Equation (31) 



E[||n m /3 - iw?!!^] < 2\\p\\lnJn) < c\\p\\l/n\ 

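with $C$ independent of $\beta$, $n$ and $m$. Now by Lemma 8,

$$(\hat\Pi_m\beta-\Pi_m\beta)\mathbf{1}_{\mathcal{J}_n} = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\sum_{j=1}^m\int_{\partial\Omega_j}R^{1/2}(z)[I-T(z)]^{-1}T(z)R^{1/2}(z)\beta\,dz.$$

Remark that $(I-T(z))^{-1}T(z) = T(z)+(I-T(z))^{-1}T^2(z)$, then

$$\mathbf{1}_{\mathcal{J}_n}(\hat\Pi_m-\Pi_m) = A_n+B_n,$$

with

$$A_n = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\sum_{j=1}^m\int_{\partial\Omega_j}(zI-\Gamma)^{-1}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1}\,dz,$$

$$B_n = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\sum_{j=1}^m\int_{\partial\Omega_j}(zI-\Gamma)^{-1/2}(I-T(z))^{-1}T(z)^2(zI-\Gamma)^{-1/2}\,dz.$$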
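The splitting into $A_n$ and $B_n$ rests on the elementary identity $(I-T)^{-1}T = T+(I-T)^{-1}T^2$, and the bound $\|(I-T)^{-1}\|_\infty \le 2$ used repeatedly in these proofs follows from a Neumann series as soon as $\|T\|_\infty \le 1/2$. The NumPy sketch below checks both facts on a random matrix; the matrix, its size and the operator norm 0.4 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

# Hypothetical operator T, rescaled so that its operator norm is 0.4 < 1/2.
T = rng.standard_normal((d, d))
T *= 0.4 / np.linalg.norm(T, 2)

I = np.eye(d)
R = np.linalg.inv(I - T)

# Identity behind the decomposition: first-order term plus remainder.
lhs = R @ T
rhs = T + R @ T @ T

# Neumann-series bound: ||(I - T)^{-1}|| <= 1 / (1 - ||T||) <= 2.
norm_R = np.linalg.norm(R, 2)

print(np.abs(lhs - rhs).max(), norm_R)
```

The first-order term is linear in the perturbation, while the remainder is quadratic, which is why the two terms are handled by different arguments.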
We deal with $A_n$ first. Calculations show that
E 



a? a 2 . 



j=l k>m ( A i ~ A ^)' 



J=l fc>m \ 3 



(\j — x k y 



j'=i 



?i 



. =1 (Aj - A m+ i) 



k>m 



I rrS 1 r )+ In 3 m m 2 r In 2 m 

< C 1 A m+ i 

\ n n 



(32) 



as soon as $(\lambda_j)_{j\ge1}$ decreases exponentially or polynomially, with $C > 0$ depending only on $R$, $r$, $a$ and $\Gamma$. The last line comes from the inequality $\lambda_j \le j^{-1}$ and Lemma 10. We turn now to $B_n$:



\ B n/31j n \\l 



(33) 



^n7^E A ME/ <« 1/2 (c)[^-r(c)]- 1 r(o 2 ii 1/2 (c)/3,^>dC) ■ 

fc>i yj=i^ 9 ^j J 

We have: 

< i? 1 / 2 (o [i-t (or 1 r (c) 2 r 1 ' 2 (c) ^ >=<[/- r (or 1 r (c) 2 ^ 1/2 (0 p, r 1 ' 2 (0 ^ > • 

We denote by $\beta^\circ := \sum_{j\ge1}j^{r/2}\beta_j\psi_j$; as $\beta$ is in $W_r^R$, the function $\beta^\circ$ is in $L^2([0,1])$. Moreover we denote by $P_r$ the diagonal compact operator defined by $P_r\psi_j = j^{-r/2}\psi_j$; we remark that $\beta = P_r\beta^\circ$. We have



R l/2 (C)^k = (Ci-T)- 1 ^ k 



l 



Then, 



< r 1 ' 2 (C) [J - T (C)r 1 n (C) 2 i? 1/2 (0 /3, ^ > I < 



\(i-no)- l \L\\no\L r 1/2 (()p< 



Now on the set by definition, we have for all £ G fij: 

ll( J -7*(C)r X |L < 2 and ||T(C)IL < ^= ln «- 
Moreover, the eigenvalues of the operator $R^{1/2}(\zeta)P_r$ are $\{k^{-r/2}(\zeta-\lambda_k)^{-1/2},\ k\ge1\}$; then, for all $\zeta\in\partial\Omega_j$:

(34) 



R^ 2 {c)p r = sup{fc-/ 2 ic - A fc r 1/2 i = j- r/2 /V* 

oo k>l 



J- 



and Equation (33) becomes: 



I . / "L a 2 7 -r/2 

1 ^ - I *j , 2 7 



n- 



fc>i \j=i 



j -/an. 



\/k - A fc | 



In the polynomial case (P), calculations lead to the following bound 

In 5 m In 4 n 



\B n f31 Jn r r <C- 



-m 



max{(2-a+(7-r) + ) + ,2-a+(5-r) + } 



11^ 
with $a > 1$ such that $\lambda_j \asymp j^{-a}$ and $C > 0$ depending only on $R$, $r$, $a$ and $\Gamma$. Gathering with (32) we obtain the expected result.

In the exponential case (E) we have 



in fl1 ,i2 < w ln 6 mln 4 ra 



?1- 



with C depending only on R, r, a and T. 



□ 



A.4 Empirical and theoretical bias terms

Lemma 12. Suppose that assumptions H2, H3 and H4 are fulfilled and that $\beta \in W_r^R$ with $r, R > 0$ such that, in the polynomial case (P), $a + r > 2$. Then for all $m = 1,\dots,N_n$:



E[\\J3 - fl m /3||2] < 4E[||/3 - U m (3\\ 2 r ] + r m , n , 
where T m ^ n < me{m)/n where e(m) — > u>/ien m — >■ +oo and is independent of j3 and m. 



(35) 



Proof. First imagine that the random projectors in the equation above are replaced by non-random ones. It is elementary to see that



E 



n 



mHUn 



n 



mh>\\T 



E 



n m - n w /3 



and that consequently, to get (35), it is enough to show that both $\mathbb{E}\|(\hat\Pi_m-\Pi_m)\beta\|_\Gamma^2$ and $\mathbb{E}\|(\hat\Pi_m-\Pi_m)\beta\|_{\Gamma_n}^2$ are bounded by the remainder term of (35). The first bound was proved asymptotically in Cardot et al. [13] and non-asymptotically in Proposition 20 of Crambes and Mas [17] in a slightly more general framework. Specifically these authors get



E 



< A 



m 2 \ m 



n 



n m - n m j p 

where A does not depend on m and n. The only point left is to prove the same sort of bound for 

2" 



E 



Hffl Hjn ) P 



The derivation makes use of perturbation methods already used in other parts of proofs. We will skip 
technical details to concentrate on the essential facts. 
In a first step remark that 



n m -n m )p 



(r n -r) n m -n m )p, n m -n m )p) + 



n, 



n m )p 



and it is enough to focus on the first term and to prove the bound for 



E 



(r n -r) n m -n w )p 



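after applying the Cauchy-Schwarz inequality coupled with the fact that $\|(\hat\Pi_m-\Pi_m)\beta\| \le 2\|\beta\|$.
Now by Lemma 8

$$(\Gamma_n-\Gamma)(\hat\Pi_m-\Pi_m)\beta\,\mathbf{1}_{\mathcal{J}_n} = \frac{1}{2i\pi}\sum_{j=1}^m\int_{\partial\Omega_j}(\Gamma_n-\Gamma)R^{1/2}(\zeta)[I-T(\zeta)]^{-1}T(\zeta)R^{1/2}(\zeta)\beta\,d\zeta\,\mathbf{1}_{\mathcal{J}_n}.$$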
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Hence by definition of J n , ||(J - 7"(C)) _1 ||« S ol^„ < 2 and HHOHool^ < ^ then 



E 



(r n -r) n m -n m )p 



ljn 



7T ^ 

3=1 



E 



an, 



(r n - r) fl 1 / 2 (0 IITCOIIooVnl ^ 1/2 (0/3 



i //t 

Inn v — v 

3=1 



an, 



< 



Inn 



TT\/n 



^11 £' 



R 1/2 (C)P, 

; -r/2 



/3 ( 



E 



(T n -T)R^ 2 (C) 



d( 



d( 



3=1 



oj Jan 



E 



\(T n -T)Ri/2(()\\ 2 HS l Jn d( (36) 



where we recall that, by Equation (34), 



r 1/2 (cm 



< 



-r/2 



(the definitions of /3° and P r are given in the proof of Lemma 11). 
Treating E 



C g an,- 



(r n — T) R 1 / 2 {Q\\ HS with computations similar to those carried previously we get, for all 



E 



(r n -r) r 1 ' 2 (0 



HS 



< 



c' aj 



n 



with C' = Tr(r) max{l, 6—1} and putting into Equation (36) we obtain 

C Inn 



E 



(r„-r) n ro -n m )p 



< 



irn 



3=1 



Considering again two cases related to the rate of decrease for the eigenvalues we see first that for an 
exponential decay the term above is bounded up to a constant by (In n)/n. Secondly in case of polynomial 
decay we get : 

c ' lnn p°\\ jrf /2 r r/2 r {1+a)/2 \n 3/2 j 



E 



(r„ - r) n m - n w \p 



< 



< 



irn 

C Inn 

ttti 



3=1 

m 



and In n In 5 1 2 m ■ m 2 

Thus the proof is finished by Lemma [9] 



in ■ iir* ( r + a )/ 2 jn = o(m/n) when r + a > 2. 



E 



n, 



n m )/3 



L 7 C 

yJ n 



<w\\lnj z n)< c " {h,T) 



□ 



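A.5 Technical part of the bound on $\mathbb{P}(\Delta_n^c)$

Proof of Lemma 3. Let $\gamma$ be the contour defined by Figure 4 with $m = N_n$.

We have, by Lemma 8 and the fact that $(I-T(z))^{-1}T(z) = T(z)+(I-T(z))^{-1}T^2(z)$,

$$\Gamma^{1/2}\big[\Gamma_n^\dagger-\Gamma^\dagger\big]\Gamma^{1/2}\mathbf{1}_{\mathcal{J}_n} = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\int_\gamma\frac1z\Gamma^{1/2}R^{1/2}(z)[I-T(z)]^{-1}T(z)R^{1/2}(z)\Gamma^{1/2}\,dz \qquad (37)$$

$$= \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\int_\gamma\frac1z\Gamma^{1/2}R(z)(\Gamma_n-\Gamma)R(z)\Gamma^{1/2}\,dz + \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\int_\gamma\frac1z\Gamma^{1/2}R^{1/2}(z)[I-T(z)]^{-1}T^2(z)R^{1/2}(z)\Gamma^{1/2}\,dz. \qquad (38)$$

Now, we consider separately the two decreasing rates of the $\lambda_j$'s.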
Exponential decrease: $\lambda_j \asymp \exp(-j^a)$, with $a > 0$. By Equation (37) and the fact that $\|(I-T(z))^{-1}\|_\infty \le 2$ on the set $\mathcal{J}_n$,



ir^rt _ rt)r 1 /2|| 00 i^ i < _L / ir 1 /^^) [7 _ r{z)] -i T {z)R l i 2 {z)Y l i 2 dz 



For 2; G supp(7), the eigenvalues of the operator T l / 2 R l / 2 (z) are 
{A^-A^/^l^then 



T^R 1 ' 2 ^ 



= sup 

00 j->j l |Z — Aj 



and 



rf2 



< C + 2 T^<C, 
Jo 1 + u 2 



where C and C are independent of n and the last inequality comes from the fact that in the exponential 
case, there exists a constant c > such that 5/v„/Ajv n — c - Then by lemmas [9] and 10 



HJn n {lir^crt - rt)!^^ > Po - 1}) < p [IIT(*)U > - i?j < 2ex P (-c' ^jvj 



with c* independent of n. The result comes from the fact that N n < 20-i/n/ln n. 



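Polynomial decrease: $\lambda_j \asymp j^{-a}$, $a > 1$. Denote by $T_1$ and $T_2$ the two terms of Equation (38), i.e.

$$T_1 = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\int_\gamma\frac1z\Gamma^{1/2}R(z)(\Gamma_n-\Gamma)R(z)\Gamma^{1/2}\,dz,$$

$$T_2 = \mathbf{1}_{\mathcal{J}_n}\frac{1}{2i\pi}\int_\gamma\frac1z\Gamma^{1/2}R^{1/2}(z)[I-T(z)]^{-1}T^2(z)R^{1/2}(z)\Gamma^{1/2}\,dz.$$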



First we control T2, the proof in the exponential case leads us to: 



ra^TT-'sup \\T(Z) 
z£7 L 



00 



1 



T^R^iz) 



dz, 



and 



T l / 2 RV\z) 



2 f 2X i/ S N„ 

dz < C + 
00 Jo 



\ Nn du 



<C+ ^—du + 

Jo A N n ~ 0N n Jl 



^N n -5 Nn ) 2 + 5 2 Nn u 2 VT+^ 
\N n du 



(X Nn -5 Nn ) 2 + 5 2 N u 2 VT+^ 



r2Xi/S Nn j 

1 1 -r-f / < C'ln(N n ), 

VI + n 



with C, C > independent of n. Then lemmas J^J and 
P(||T 2 |U > (po - l)/2) < P (c"ln(iV n ) sup \\\T (z)^ 



10 



and the fact that N n < 20 a /n/ In n lead us to 



> 7r(po - l)/2 < exp -c 



n 



N 2 \n\N n ) 



< C"n-\ 
with $c^{**}$ and $C''$ independent of $n$.

Now, we can calculate explicitly the term $T_1$:



-It. 



Xk v A 



7 i,fc>i 



2(2 - Ajfc)(« - Aj) 



7Tfe(r n - r)7r,-dz. 



By the Residue Theorem 



dz 



2m J 7 2(2 — Afc)(z — Xj) 



1 



if k < N n < j, 



Afc (Afc— Xj) 

otherwise. 



Then 



N„ 



Nn 



j,k=l V X 3 X k j= l k>Nr V A i( A i ~ 



7TA:(r n - T)7Tj 



N n 



+ ,S„Sv^(A fe -A,) 



"Ijn E 



N„ 



j,k=l 
3+h 



Afc V Aj i=1 



Aj v Aj 



where Sj ■= Yl 



A fc 

fc>AT„ Aw -Afc 



7Tfc. The term T disappears because vr,T7Tfc = if j ^ k. 



We control separately the operators $T_1'$ and $T_1''$. We have:



. n 

p,g=l \ i=l 



where we recall that £p =< Xi,ip p > /y/X p . Then 



P(||T 1 / || 00 >(po-l)/4)<P 



/ 

£ few >^ 

p,g=l \ i=l / 



N n 

. P>9=1 



1 n 



i=l 



> 



PO ~ 1 

47V„ 



For all p ^ q, the sequence of random variables = 1, is independent and centred and by 

assumptions H2 and H3, 

E[\M m ]<mlb m -\ 



then Lemma 6 and the condition N n < 20i/n/ln n implies 



P Moo> 



PO - 1 



•£2 \^cxp(-C^) <^n" 6 , 



with C 



/ _ (po-1) 2 



1 32(26+(/90-l)/4) 



and Cn depends only on b and po- 
We deal now with the operator $T_1''$; we can rewrite it as an average of independent random variables with values in $\mathcal{H}$, the set of Hilbert-Schmidt operators of $L^2([0,1])$ equipped with the usual norm $\|T\|^2_{HS} = \sum_{p,q\ge1}\langle T\psi_p,\psi_q\rangle^2$, i.e.

-. n N n / \ -i n 

T " = - E E s ^ ® + J 7r Xi ® s ^ = » E z - 

n i=i j=i V V A i V A i / n i= i 



where, for all $f, g, h \in L^2([0,1])$, $f\otimes g\,(h) := \langle f,h\rangle g$. In order to apply the exponential inequality for centred Hilbert-valued random variables given in Bosq [6, Theorem 2.5], we have to find two constants $B$ and $c$ such that

$$\mathbb{E}\big[\|Z_i\|^m_{HS}\big] \le \frac{m!}{2}B^2c^{m-2}.$$
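To make the rank-one operators $f\otimes g$ and the Hilbert-Schmidt norm concrete, here is a small finite-dimensional sketch in NumPy. Functions are replaced by vectors of $\mathbb{R}^d$ (an assumption made purely for illustration), so that $f\otimes g$ becomes an outer product and the Hilbert-Schmidt norm becomes the Frobenius norm; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8

# Hypothetical discretised functions f, g, h.
f, g, h = rng.standard_normal((3, d))

# Matrix of the rank-one operator h -> <f, h> g.
tensor = np.outer(g, f)
applied = tensor @ h

# Hilbert-Schmidt norm: sum of squared coordinates in an orthonormal basis,
# which is the Frobenius norm of the matrix.
hs_norm = np.linalg.norm(tensor, "fro")

print(hs_norm, np.linalg.norm(f) * np.linalg.norm(g))
```

For a rank-one operator the Hilbert-Schmidt norm factorises as $\|f\|\,\|g\|$, which is what makes moment bounds on sums of such operators tractable.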

We compute first Uglify 5 



N„ 



17.112 < o / I n i 



2 E<£ 



\2 



p.9>i i =1 

Now, by assumptions H2 and H3, 



x t ® Si x, ) > 2 = 53 £ ^ (#e« 



E[||Z,[|S fl ] < MZ&f 2 



N„ 



1/2 



e e n (\ _ \ \2 



E 



2V„ 



< * EE 



v pi,...,p m =l qi,...,q m >N n j 
m/2 

A 2 



E 



p=l q>Ar n 



(a p - x q y 



We apply then Theorem 2.5 of Bosq [6] with B 2 = 2b 2 J2p=i E 9 >iv n (A 1\ p and 

A2 \ 1/2 



c = MEp"=iE 



9>iVn (A p -A 9 )2 



and obtain, with the condition N n < 20-v/n/ln n, 



|rn| 00 >^- i )< ip (ll^ , ks> 



^-^<2expf-^)£r^-- 



where C" := (po — I) 2 / (2<56 2 + \^28b(po — l)/4), depends only on po and (5 where 8 > depends only on 
the sequence (Aj) 3 ->i and verifies, for all p < N n , 



q>N n 



\ 2 



(Ap - Ag)' 



< AW/2- 



Appendix B Control of pen in the unknown variance case 

Lemma 13. Under Assumption HI, set k := 29(1 + 28), we have, for all m G A4 n 



E x (peh(m) - pen (m,) (m))l G ] < k 



m , 



n 



and 



□ 



(39) 



Ex[(pen(™)(mM) - pen(m( w )))] < -E x [2m^V n (n m /3 - /L (w) )] + 



n\/n 



Var(e 2 ) + 2||/3|| r a . (40) 
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Proof. By definitions of = j n (Pm) and m , we have < j n (IL m /3), then 



E x [(pm(m) - pen^(m))] = «-E x [(<^ - a')] < K-E x [(7n(n m /3) - a% 



m 



n 



n 



Now, by independence of £j with < (3 — tl m f3, Xi >, 



E X [(7n(nm/3)-fT 2 



E-i 



-S2(Y t - <U m f3,Xi >) 2 -a 2 
i=l / 
1 n 

- ^ e 2 - 2e t < (3 - II m /3, X, > + < p - U m p, X { > 2 -a 2 



i=i 



E x [</3-n m /3,X 4 > 2 ] =E X [ 



n 



and Equation (39) follows. 
Likewise: 



Ex[(pen(™)(m(™))-pih(m(™)))] = ^Ex[m (u "V - a 2 ^)] 



(E x [m W (^ -? 2 )] "Ex 



m 



(uv)\ 



< ~(E x [m( w? V - a 2 )] + 2E x [mH, n ( /3 _ 

< ^(E x [m(™)(a 2 - 5 2 )] + 2E x [m ( ™W(/3 - ft m /3) + m^V n (n m /3 - /3 a(m) )]) 
with a 2 := ^ By Cauchy-Schwarz's Inequality 

E x [fh<«V - a 2 )] < A> n E x [(a 2 - a 2 ) 2 ] 1 / 2 = nJ 1 £ E x [(, 2 - a 2 ){e 2 - a 2 )]\ = A 

and, since the $\varepsilon_i$'s are independent of the $X_i$'s and consequently of $\hat\Pi_m$, we have:
E x [mu n (p-f[ m (3)} < iV n Ex[^(/3-n m /3)] 1/2 

N ( n 

< —\ E x[£n^ 2 <P~ timf3,X h X p - fl m P,X l2 >} 
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< ^Eni/3-n^ii 2 ]V*<^ ff 